log_line_prefix: make it possible to add the search_path

  • Jump to comment-1
    p.psql@pinaraf.info2022-07-25T07:37:52+00:00
    Hello I'm working on several databases where schemas are used to differentiate the tenants. This is great for performance, but several tools are lacking around this usecase by not showing the schema, one of them being log_line_prefix. It is possible to work around this using the application_name, but a mistake on the application side would be fatal, while the search_path would still indicate the real tables used in a query. The attached patch implements this, using %S. I've not written the documentation yet, since I'm not sure this would be acceptable as is, or if a more "generic" method should be used (I thought of %{name} to fetch an arbitrary GUC, but did not implement due to a lack of need for that feature)
    • Jump to comment-1
      lukas@fittl.com2022-07-26T01:08:01+00:00
      On Mon, Jul 25, 2022 at 12:38 AM Pierre Ducroquet <p.psql@pinaraf.info> wrote: > usecase by not showing the schema, one of them being log_line_prefix. > It is possible to work around this using the application_name, but a > mistake > on the application side would be fatal, while the search_path would still > indicate the real tables used in a query. > I'm assuming this is mostly referring to STATEMENT log lines and other situations where the original query is output (e.g. auto_explain). +1 on the benefit of solving this (I've had this use case before), but I think we can keep this more specific than a general log_line_prefix option. The search_path isn't relevant to any log line that doesn't reference a query, since e.g. autovacuum log output fully qualifies its relation names, and many other common log lines have nothing to do with tables or queries. What if we instead had something like this, as an extra CONTEXT (or DETAIL) log line: LOG: duration: 4079.697 ms execute <unnamed>: SELECT * FROM x WHERE y = $1 LIMIT $2 DETAIL: parameters: $1 = 'long string', $2 = '1' CONTEXT: settings: search_path = 'my_tenant_schema, "$user", public' That way you could determine that the slow query was affecting the "x" table in "my_tenant_schema". This log output would be controlled by a new GUC, e.g. "log_statement_search_path" with three settings: (1) never, (2) non_default, (3) always. The default would be "never" (same as today). "non_default" would output the search path when a SET has modified it in the current session (and so we couldn't infer it from the config or the role/database overrides). "always" would always output the search path for statement-related log lines. Thanks, Lukas -- Lukas Fittl
      • Jump to comment-1
        p.psql@pinaraf.info2022-07-26T12:41:55+00:00
        On Tuesday, July 26, 2022 3:08:01 AM CEST Lukas Fittl wrote: > On Mon, Jul 25, 2022 at 12:38 AM Pierre Ducroquet <p.psql@pinaraf.info> > > wrote: > > usecase by not showing the schema, one of them being log_line_prefix. > > It is possible to work around this using the application_name, but a > > mistake > > on the application side would be fatal, while the search_path would still > > indicate the real tables used in a query. > > I'm assuming this is mostly referring to STATEMENT log lines and other > situations where the original query is output (e.g. auto_explain). > > +1 on the benefit of solving this (I've had this use case before), but I > think we can keep this more specific than a general log_line_prefix option. > The search_path isn't relevant to any log line that doesn't reference a > query, since e.g. autovacuum log output fully qualifies its relation names, > and many other common log lines have nothing to do with tables or queries. > > What if we instead had something like this, as an extra CONTEXT (or DETAIL) > log line: > > LOG: duration: 4079.697 ms execute <unnamed>: > SELECT * FROM x WHERE y = $1 LIMIT $2 > DETAIL: parameters: $1 = 'long string', $2 = '1' > CONTEXT: settings: search_path = 'my_tenant_schema, "$user", public' > > That way you could determine that the slow query was affecting the "x" > table in "my_tenant_schema". > > This log output would be controlled by a new GUC, e.g. > "log_statement_search_path" with three settings: (1) never, (2) > non_default, (3) always. > > The default would be "never" (same as today). "non_default" would output > the search path when a SET has modified it in the current session (and so > we couldn't infer it from the config or the role/database overrides). > "always" would always output the search path for statement-related log > lines. > > Thanks, > Lukas Hi This is a good idea. I've hacked a first implementation of it (lacking documentation, and several logs are still missing) attached to this email. The biggest issue I had was with knowing where the setting come from since no guc.h function expose that information. I worked around this a bit, but I'm sure it would be preferable to do it otherwise. Thanks for your feedbacks Regards Pierre
    • Jump to comment-1
      alvherre@alvh.no-ip.org2022-07-25T09:52:41+00:00
      On 2022-Jul-25, Pierre Ducroquet wrote: > This is great for performance, but several tools are lacking around this > usecase by not showing the schema, one of them being log_line_prefix. > The attached patch implements this, using %S. I've not written the > documentation yet, since I'm not sure this would be acceptable as is, or if a > more "generic" method should be used (I thought of %{name} to fetch an > arbitrary GUC, but did not implement due to a lack of need for that feature) It seems that this would be too noisy to be truly usable. What if we emitted a log line when the variable changed, and the value that's in use when the connection starts? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "En las profundidades de nuestro inconsciente hay una obsesiva necesidad de un universo lógico y coherente. Pero el universo real se halla siempre un paso más allá de la lógica" (Irulan)
      • Jump to comment-1
        p.psql@pinaraf.info2022-07-25T13:38:55+00:00
        On Monday, July 25, 2022 11:52:41 AM CEST Alvaro Herrera wrote: > On 2022-Jul-25, Pierre Ducroquet wrote: > > This is great for performance, but several tools are lacking around this > > usecase by not showing the schema, one of them being log_line_prefix. > > > > The attached patch implements this, using %S. I've not written the > > documentation yet, since I'm not sure this would be acceptable as is, or > > if a more "generic" method should be used (I thought of %{name} to fetch > > an arbitrary GUC, but did not implement due to a lack of need for that > > feature) > It seems that this would be too noisy to be truly usable. What if we > emitted a log line when the variable changed, and the value that's in > use when the connection starts? Then the log files would be filled by these messages, only to be able to make use of the few slow queries that are logged during the day. Or it would need to be a log before each slow query. I'm not sure how well it would work.
        • Jump to comment-1
          alvherre@alvh.no-ip.org2022-07-25T16:06:28+00:00
          On 2022-Jul-25, Pierre wrote: > On Monday, July 25, 2022 11:52:41 AM CEST Alvaro Herrera wrote: > > It seems that this would be too noisy to be truly usable. What if we > > emitted a log line when the variable changed, and the value that's in > > use when the connection starts? > > Then the log files would be filled by these messages, only to be able to make > use of the few slow queries that are logged during the day. Ah, yeah, that's not useful for that case ... > Or it would need to be a log before each slow query. I'm not sure how > well it would work. ... and this would probably be prohibitively complex to implement and use, as well as too slow for the high traffic case. Maybe your idea of allowing arbitrary GUCs is not a bad one, something like %{search_path}G (where we add a letter at the end just so we can add other things in the future that aren't GUCs.) -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
          • Jump to comment-1
            tgl@sss.pgh.pa.us2022-07-25T20:43:01+00:00
            Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Maybe your idea of allowing arbitrary GUCs is not a bad one, something > like > %{search_path}G > (where we add a letter at the end just so we can add other things in the > future that aren't GUCs.) I'm pretty uncomfortable about the amount of code that could potentially be reached during an error logging attempt if we do that. regards, tom lane